
fix: fall back to CPU when CUDA is requested but unavailable (#216) #244

Open

ousamabenyounes wants to merge 1 commit into microsoft:main from ousamabenyounes:fix/issue-216

Conversation

@ousamabenyounes

What does this PR do?

Fixes #216

`PromptCompressor` defaulted to `device_map="cuda"`. On a CPU-only machine (e.g. Windows with torch installed from `--index-url .../whl/cpu`), instantiating the compressor immediately fails with `AssertionError: Torch not compiled with CUDA enabled`, forcing every user to pass `device_map="cpu"` manually and making the copy-paste examples in the README unusable out of the box.

This PR makes `load_model` check `torch.cuda.is_available()` when a CUDA device map is requested. If CUDA isn't available, it emits a `RuntimeWarning` and falls back to `"cpu"` transparently, so existing code with the default `device_map="cuda"` keeps working on GPU machines and no longer crashes on CPU-only machines.
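The guard described above can be sketched as follows. The helper name `resolve_device_map` and the `cuda_available` parameter (standing in for `torch.cuda.is_available()`, so the sketch runs without a torch install) are illustrative, not the exact code in the patch:

```python
import warnings


def resolve_device_map(device_map: str, cuda_available: bool) -> str:
    """Return a usable device map, downgrading CUDA requests to CPU.

    `cuda_available` stands in for torch.cuda.is_available().
    """
    if "cuda" in device_map and not cuda_available:
        warnings.warn(
            f'device_map="{device_map}" was requested but CUDA is not '
            'available; falling back to "cpu".',
            RuntimeWarning,
        )
        return "cpu"
    return device_map
```

When CUDA is present, or the caller already asked for CPU, the requested map passes through unchanged; only the unavailable-CUDA case is rewritten.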

Changes

| File | Change |
| --- | --- |
| `llmlingua/prompt_compressor.py` | Guard `device_map="cuda"` with `torch.cuda.is_available()` and fall back to `"cpu"` with a `RuntimeWarning`. Also adds a `warnings` import. |
| `tests/test_issue_216.py` | Three new regression tests that monkeypatch `torch.cuda.is_available` and `AutoConfig` / `AutoTokenizer` / `AutoModel*`; they run in ~4 s and require no network access or model download. |
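A minimal, self-contained sketch of the monkeypatching pattern those tests rely on, using stdlib `unittest.mock`. The `FakeCuda` class and toy `load_model` are stand-ins for `torch.cuda` and the real loader (which additionally patches the transformers `Auto*` classes):

```python
import warnings
from unittest import mock


class FakeCuda:
    """Stand-in for torch.cuda; the real tests patch torch.cuda.is_available."""

    @staticmethod
    def is_available() -> bool:
        return True


def load_model(device_map: str = "cuda") -> str:
    """Toy version of the guarded loader: returns the device it would use."""
    if "cuda" in device_map and not FakeCuda.is_available():
        warnings.warn("CUDA unavailable; falling back to cpu", RuntimeWarning)
        return "cpu"
    return device_map


def test_falls_back_to_cpu_when_cuda_missing() -> None:
    # Monkeypatch the availability check, mirroring how the real tests
    # simulate a CPU-only torch build without needing special hardware.
    with mock.patch.object(FakeCuda, "is_available", return_value=False):
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            assert load_model("cuda") == "cpu"
        assert any(issubclass(w.category, RuntimeWarning) for w in caught)
```

Because everything the loader touches is patched, no model weights are downloaded and the tests stay fast and network-free.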

Behavior matrix

| `device_map` | `torch.cuda.is_available()` | Before | After |
| --- | --- | --- | --- |
| `"cuda"` (default) | `True` | runs on CUDA | runs on CUDA (unchanged) |
| `"cuda"` (default) | `False` | ❌ `AssertionError` | runs on CPU + `RuntimeWarning` |
| `"cpu"` explicit | `False` | runs on CPU | runs on CPU (unchanged) |

Verification

  • Baseline: 2 tests pass, 3 tests fail with a pre-existing `ValueError: too many values to unpack (expected 2)` in `iterative_compress_prompt` (transformers DynamicCache API change — unrelated to this PR).
  • Post-fix: same 2 baseline tests still pass, 3 new tests pass, same 3 pre-existing failures — zero regressions.

Generated by Claude Code
Vibe coded by ousamabenyounes

…ft#216)

PromptCompressor defaulted to device_map="cuda" even when torch was
built without CUDA support, producing "AssertionError: Torch not
compiled with CUDA enabled" on Windows / CPU-only machines. The fix
transparently falls back to "cpu" (with a RuntimeWarning) when "cuda"
is requested but torch.cuda.is_available() is False, so the default
still works out of the box on CPU-only installs.

Generated by Claude Code
Vibe coded by ousamabenyounes

Co-Authored-By: Claude <noreply@anthropic.com>


Successfully merging this pull request may close these issues.

[BUG] LLMLingua fails on Windows with PyTorch CPU: "Torch not compiled with CUDA enabled"
